**CS 3853: Computer Architecture**

Recitation-3

1. Consider the following code fragment:

Loop:

LW R1, 0(R2)

DADDI R3, R3, 1

SW R5, 0(R4)

DADDI R6, R6, -4

BNEZ R6, Loop

Consider the standard 5-stage pipeline machine (IF ID EX MEM WB). Assume the initial value of R6 is 396 and all memory accesses hit in the cache. Show the timing of the above code fragment for one iteration as well as for the load of the second iteration.

For this part, assume there is **no forwarding or bypassing hardware**. Assume a register write occurs in the first half of the cycle and a register read occurs in the last half of the cycle. Also, assume that branches are resolved in the memory stage and are handled by flushing the pipeline. Use a pipeline timing chart to show the timing.

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LW R1, 0(R2) | F | D | X | M | W |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| DADDI R3, R3, 1 |  | F | D | X | M | W |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| SW R5, 0(R4) |  |  | F | D | X | M | W |  |  |  |  |  |  |  |  |  |  |  |  |  |
| DADDI R6, R6, -4 |  |  |  | F | D | X | M | W |  |  |  |  |  |  |  |  |  |  |  |  |
| BNEZ R6, Loop |  |  |  |  | F | S | S | D | X | M | W |  |  |  |  |  |  |  |  |  |
| LW R1, 0(R2) |  |  |  |  |  |  |  |  |  |  | F | D | X | M | W |  |  |  |  |  |
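
The stall rules behind this chart can be sketched in a few lines of Python. This is a minimal model under the stated assumptions only (no forwarding, register write in the first half of WB and read in the second half of ID, taken branch resolved in MEM with the target fetched in the following cycle). The function name `schedule_no_forwarding` and the tuple layout are illustrative choices, not part of the recitation.

```python
def schedule_no_forwarding(program):
    """program: list of (text, dest, sources, is_taken_branch) tuples."""
    wb_cycle = {}      # register -> cycle in which its newest value is written back
    next_fetch = 1
    chart = []
    for text, dest, sources, is_taken_branch in program:
        fetch = next_fetch
        # ID reads in the second half of a cycle, so it may share the cycle
        # with the producer's WB (which writes in the first half).
        decode = max([fetch + 1] + [wb_cycle.get(r, 0) for r in sources])
        execute, memory, writeback = decode + 1, decode + 2, decode + 3
        if dest is not None:
            wb_cycle[dest] = writeback
        stages = {fetch: "F", decode: "D", execute: "X", memory: "M", writeback: "W"}
        for cycle in range(fetch + 1, decode):
            stages[cycle] = "S"          # stalled while waiting for a source register
        chart.append((text, stages))
        # A taken branch resolved in MEM lets the target be fetched one cycle later;
        # otherwise the next instruction is fetched when this one leaves IF.
        next_fetch = memory + 1 if is_taken_branch else decode
    return chart


program = [
    ("LW R1, 0(R2)", "R1", ["R2"], False),
    ("DADDI R3, R3, 1", "R3", ["R3"], False),
    ("SW R5, 0(R4)", None, ["R5", "R4"], False),
    ("DADDI R6, R6, -4", "R6", ["R6"], False),
    ("BNEZ R6, Loop", None, ["R6"], True),
    ("LW R1, 0(R2)", "R1", ["R2"], False),   # load of the second iteration
]

chart = schedule_no_forwarding(program)
for text, stages in chart:
    cells = [stages.get(cycle, ".") for cycle in range(1, max(stages) + 1)]
    print(f"{text:<18}" + " ".join(cells))
```

Each printed row lists one cell per cycle starting at cycle 1, with `.` for cycles in which the instruction is not in the pipeline.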

2. Consider the following code fragment:

Loop:

LW R1, 0(R2)

DADDI R1, R1, 1

SW R1, 0(R2)

DADDI R2, R2, 4

DADDI R4, R4, -4

BNEZ R4, Loop

Consider the standard 5-stage pipeline machine (IF ID EX MEM WB). Assume the initial value of R4 is 396 and all memory accesses hit in the cache. Show the timing of the above code fragment for one iteration as well as for the load of the second iteration.

For this part, assume there is **no forwarding or bypassing hardware**. Assume a register write occurs in the first half of the cycle and a register read occurs in the last half of the cycle. Also, assume that branches are resolved in the memory stage and are handled by flushing the pipeline. Use a pipeline timing chart to show the timing.

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LW R1, 0(R2) | F | D | X | M | W |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| DADDI R1, R1, 1 |  | F | S | S | D | X | M | W |  |  |  |  |  |  |  |  |  |  |  |  |
| SW R1, 0(R2) |  |  |  |  | F | S | S | D | X | M | W |  |  |  |  |  |  |  |  |  |
| DADDI R2, R2, 4 |  |  |  |  |  |  |  | F | D | X | M | W |  |  |  |  |  |  |  |  |
| DADDI R4, R4, -4 |  |  |  |  |  |  |  |  | F | D | X | M | W |  |  |  |  |  |  |  |
| BNEZ R4, Loop |  |  |  |  |  |  |  |  |  | F | S | S | D | X | M | W |  |  |  |  |
| LW R1, 0(R2) |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  | F | D | X | M | W |

3. Consider the following code fragment:

Loop:

LW R1, 0(R2)

LW R3, 0(R1)

DADDI R3, R3, 1

SW R1, 0(R2)

DADDI R4, R4, -4

BNEZ R4, Loop

Show the timing sequence for the pipeline **with full forwarding and bypassing hardware** (as discussed in class). Assume that branches are resolved in the MEM stage and are predicted as not taken.

| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LW R1, 0(R2) | F | D | X | M | W |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| LW R3, 0(R1) |  | F | D | S | X | M | W |  |  |  |  |  |  |  |  |  |  |  |  |  |
| DADDI R3, R3, 1 |  |  | F | S | D | S | X | M | W |  |  |  |  |  |  |  |  |  |  |  |
| SW R1, 0(R2) |  |  |  |  | F | S | D | X | M | W |  |  |  |  |  |  |  |  |  |  |
| DADDI R4, R4, -4 |  |  |  |  |  |  | F | D | X | M | W |  |  |  |  |  |  |  |  |  |
| BNEZ R4, Loop |  |  |  |  |  |  |  | F | D | X | M | W |  |  |  |  |  |  |  |  |
| LW R1, 0(R2) |  |  |  |  |  |  |  |  |  |  |  | F | D | X | M | W |  |  |  |  |
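
The forwarding case can be modeled in the same spirit. The sketch below is one possible reading of "full forwarding," and its rules are assumptions rather than the course's definition: an ALU result can be forwarded to an EX stage in the cycle after its own EX, a load result in the cycle after its MEM, an instruction enters ID only once the previous one has moved to EX, and the taken branch redirects fetch in the cycle after its MEM. The name `schedule_with_forwarding` and the tuple layout are illustrative.

```python
def schedule_with_forwarding(program):
    """program: list of (text, dest, sources, is_load, is_taken_branch) tuples."""
    value_ready = {}   # register -> cycle at whose end the value can be forwarded
    prev_execute = 0
    next_fetch = 1
    chart = []
    for text, dest, sources, is_load, is_taken_branch in program:
        fetch = next_fetch
        # ID is free only after the previous instruction has entered EX.
        decode = max(fetch + 1, prev_execute)
        # EX needs every source to be forwardable from an earlier cycle.
        execute = max([decode + 1] + [value_ready.get(r, 0) + 1 for r in sources])
        memory, writeback = execute + 1, execute + 2
        if dest is not None:
            value_ready[dest] = memory if is_load else execute
        stages = {cycle: "S" for cycle in range(fetch, writeback + 1)}
        stages.update({fetch: "F", decode: "D", execute: "X",
                       memory: "M", writeback: "W"})
        chart.append((text, stages))
        next_fetch = memory + 1 if is_taken_branch else decode
        prev_execute = execute
    return chart


program = [
    ("LW R1, 0(R2)", "R1", ["R2"], True, False),
    ("LW R3, 0(R1)", "R3", ["R1"], True, False),
    ("DADDI R3, R3, 1", "R3", ["R3"], False, False),
    ("SW R1, 0(R2)", None, ["R1", "R2"], False, False),
    ("DADDI R4, R4, -4", "R4", ["R4"], False, False),
    ("BNEZ R4, Loop", None, ["R4"], False, True),
    ("LW R1, 0(R2)", "R1", ["R2"], True, False),   # load of the second iteration
]

chart = schedule_with_forwarding(program)
for text, stages in chart:
    cells = [stages.get(cycle, ".") for cycle in range(1, max(stages) + 1)]
    print(f"{text:<18}" + " ".join(cells))
```

Under these assumptions the only stalls are the two load-use bubbles and the fetch slots lost to the taken branch. The model places the second-iteration LW in cycles 12 to 16 with no stall, since nothing in the loop writes R2.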

4. We begin with a computer built as a single-cycle implementation. When the datapath is split into stages by functionality, the stages do not all require the same amount of time. The original machine had a clock cycle time of 7 ns. After the stages were split, the measured times were IF, 1 ns; ID, 1.5 ns; EX, 1 ns; MEM, 2 ns; and WB, 1.5 ns. The pipeline register delay is 0.1 ns.

a. What is the clock cycle time of the 5-stage pipelined machine?

The clock cycle time is the latency of the slowest stage plus the pipeline register delay: 2 ns (MEM) + 0.1 ns = 2.1 ns.

b. If there is a stall every 4 instructions, what is the CPI of the new machine?

CPI is clock cycles per instruction. One stall cycle every 4 instructions means 5 cycles per 4 instructions, so CPI = 5 / 4 = 1.25.

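c. What is the speedup of the pipelined machine over the single-cycle machine?

With $I$ the instruction count, CPU time is $I \times \text{CPI} \times \text{clock cycle time}$:

$T_{\text{single}} = I \times 1 \times 7\ \text{ns} = 7I\ \text{ns}$

$T_{\text{pipe}} = I \times 1.25 \times 2.1\ \text{ns} = 2.625I\ \text{ns}$

$\text{Speedup} = 7I / 2.625I \approx 2.67$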

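d. If the pipelined machine had an infinite number of stages, what would its speedup be over the single-cycle machine?

With infinitely many stages, the logic delay per stage approaches zero, so the clock cycle time approaches the pipeline register delay of 0.1 ns. There are no stalls, so CPI = 1.

$T_{\text{pipe}} = I \times 1 \times 0.1\ \text{ns} = 0.1I\ \text{ns}$

$\text{Speedup} = 7I / 0.1I = 70$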
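
The arithmetic for parts a through d can be checked with a short Python script. It is only a sketch that plugs the numbers from the problem statement into CPU time = instruction count × CPI × cycle time; the variable names are illustrative, and the instruction count cancels out of every speedup.

```python
# Values taken from the problem statement (all times in ns).
stage_ns = {"IF": 1.0, "ID": 1.5, "EX": 1.0, "MEM": 2.0, "WB": 1.5}
reg_delay_ns = 0.1
single_cycle_ns = 7.0

# a. The slowest stage plus the pipeline register delay sets the clock.
cycle_ns = max(stage_ns.values()) + reg_delay_ns

# b. One stall cycle per 4 instructions adds 1/4 cycle to the ideal CPI of 1.
cpi = 1 + 1 / 4

# c. Speedup is the ratio of time per instruction (single-cycle CPI is 1).
speedup = (1 * single_cycle_ns) / (cpi * cycle_ns)

# d. Infinite stages: cycle time approaches the register delay, no stalls.
limit_speedup = (1 * single_cycle_ns) / (1 * reg_delay_ns)

print(f"a. cycle time = {cycle_ns:.2f} ns")
print(f"b. CPI        = {cpi:.2f}")
print(f"c. speedup    = {speedup:.2f}")
print(f"d. speedup    = {limit_speedup:.2f}")
```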